Quantifying the Informativeness of Similarity Measurements

Authors

  • Austin J. Brockmeier
  • Tingting Mu
  • Sophia Ananiadou
  • John Yannis Goulermas
Abstract

In this paper, we describe an unsupervised measure for quantifying the ‘informativeness’ of correlation matrices formed from the pairwise similarities or relationships among data instances. The measure quantifies the heterogeneity of the correlations and is defined as the distance between a correlation matrix and the nearest correlation matrix with constant off-diagonal entries. This non-parametric notion generalizes existing test statistics for equality of correlation coefficients by allowing for alternative distance metrics, such as the Bures and other distances from quantum information theory. For several distance and dissimilarity metrics, we derive closed-form expressions of informativeness, which can be applied as objective functions for machine learning applications. Empirically, we demonstrate that informativeness is a useful criterion for selecting kernel parameters, choosing the dimension for kernel-based nonlinear dimensionality reduction, and identifying structured graphs. We also consider the problem of finding a maximally informative correlation matrix around a target matrix, and explore parameterizing the optimization in terms of the coordinates of the sample or through a lower-dimensional embedding. In the latter case, we find that maximizing the Bures-based informativeness measure, which is maximal for centered rank-1 correlation matrices, is equivalent to minimizing a specific matrix norm, and present an algorithm to solve the minimization problem using the norm’s proximal operator. The proposed correlation denoising algorithm consistently improves spectral clustering. Overall, we find informativeness to be a novel and useful criterion for identifying non-trivial correlation structure.
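As a rough illustration of the definition in the abstract, the sketch below computes the distance under the Frobenius (Euclidean) norm only. It relies on the fact that, for this norm, the nearest correlation matrix with constant off-diagonal entries is obtained by replacing every off-diagonal entry with their mean. The function name `frobenius_informativeness` is hypothetical, and the paper's own closed-form expressions, including the Bures-based one, may differ in scaling or normalization.

```python
import numpy as np


def frobenius_informativeness(K):
    """Distance from a correlation matrix K to the nearest correlation
    matrix with constant off-diagonal entries, under the Frobenius norm.

    Illustrative sketch only: it covers the Euclidean case, not the
    Bures or other distances mentioned in the abstract.
    """
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    off_diagonal = ~np.eye(n, dtype=bool)

    # Least-squares fit of a single constant to the off-diagonal entries.
    # For a valid correlation matrix this mean lies in [-1/(n-1), 1],
    # so the resulting constant matrix is itself a correlation matrix.
    a = K[off_diagonal].mean()

    nearest = np.full((n, n), a)
    np.fill_diagonal(nearest, 1.0)

    return np.linalg.norm(K - nearest, ord="fro"), a


if __name__ == "__main__":
    # Two perfectly correlated pairs that are uncorrelated with each other.
    K_block = np.kron(np.eye(2), np.ones((2, 2)))
    distance, constant = frobenius_informativeness(K_block)
    print(distance, constant)
```

Under this sketch, a matrix whose off-diagonal entries are all equal scores zero, while block or cluster structure yields a positive value, which matches the abstract's description of the measure as quantifying the heterogeneity of the correlations.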

Similar resources

Institutional Ownership, Business Cycles and Earnings Informativeness of Income Smoothing: Evidence from Iran

Managers engage in income smoothing either to communicate private information about future earnings to investors (informativeness hypothesis) or to distort financial performance for opportunistic purposes (opportunism hypothesis). Business cycles and the monitoring role of institutional ownership may affect the earnings informativeness of income smoothing. The purpose of this research is to exa...

The Informativeness of Reported Earnings and Characteristics of the Audit Committee

An information usefulness approach to decision making holds that only information that brings valuable messages to investors and leads to stock price adjustments is regarded as useful. This study examines the effectiveness of audit committees in improving earnings quality and informativeness, particularly among family-owned firms. Earnings informativeness was measured through the re...

Measuring Term Informativeness in Context

Measuring term informativeness is a fundamental NLP task. Existing methods, mostly based on statistical information in corpora, do not actually measure informativeness of a term with regard to its semantic context. This paper proposes a new lightweight feature-free approach to encode term informativeness in context by leveraging web knowledge. Given a term and its context, we model contextaware...

Investigating the effect of stock price informativeness on labor investment efficiency

The managerial learning hypothesis suggests that managers can learn from the informativeness of their company's stock price, which can help improve their decision-making efficiency. According to the managerial learning hypothesis, stock price informativeness can affect labor investment efficiency, since stock prices contain valuable information that managers have about the company's fu...

Fusion of Similarity Data in Clustering

Fusing multiple information sources can yield significant benefits to successfully accomplish learning tasks. Many studies have focussed on fusing information in supervised learning contexts. We present an approach to utilize multiple information sources in the form of similarity data for unsupervised learning. Based on similarity information, the clustering task is phrased as a non-negative ma...

Journal title:
  • Journal of Machine Learning Research

Volume 18

Publication date: 2017